Module 01

Reserve the first level headings (#) for the start of a new Module. This will help to organize your portfolio in an intuitive fashion.
Note: Please edit this template to your heart’s content. This is meant to be the armature upon which you build your individual portfolio. You do not need to keep this instructive text in your final portfolio, although you do need to keep module and assignment names so we can identify what is what.

Module 01 portfolio check

The first of your second level headers (##) is to be used for the portfolio content checks. The Module 01 portfolio check has been built for you directly into this template, but will also be available as a stand-alone markdown document available on the MICB425 GitHub so that you know what is required in each module section in your portfolio. The completion status and comments will be filled in by the instructors during portfolio checks when your current portfolios are pulled from GitHub.

  • Installation check
    • Completion status:
    • Comments:
  • Portfolio repo setup
    • Completion status:
    • Comments:
  • RMarkdown Pretty PDF Challenge
    • Completion status:
    • Comments:
  • Evidence worksheet_01
    • Completion status:
    • Comments:
  • Evidence worksheet_02
    • Completion status:
    • Comments:
  • Evidence worksheet_03
    • Completion status:
    • Comments:
  • Problem Set_01
    • Completion status:
    • Comments:
  • Problem Set_02
    • Completion status:
    • Comments:
  • Writing assessment_01
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments:

Data science Friday

The remaining second level headers (##) are for separating data science Friday, regular course, and project content. In this module, you will only need to include data science Friday and regular course content; projects will come later in the course.

Installation check

Third level headers (###) should be used for links to assignments, evidence worksheets, problem sets, and readings, as seen here.

Use this space to include your installation screenshots.
Sent via e-mail

Portfolio repo setup

Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.

RMarkdown pretty PDF challenge

Please see Assignment_3.Rmd in DS_Friday folder.

Origins and Earth Systems

Evidence worksheet 01

The template for the first Evidence Worksheet has been included here. The first thing in any assignment should be link(s) to any relevant literature (which should be included as full citations in a module references section below).

You can copy-paste in the answers you recorded when working through the evidence worksheet into this portfolio template.

As you include Evidence worksheets and Problem sets in the future, ensure that you delineate Questions/Learning Objectives/etc. by using headers that are 4th level and greater. This will still create header markings when you render (knit) the document, but will exclude these levels from the Table of Contents. That’s a good thing. You don’t want to clutter the Table of Contents too much.

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?
    • What is the number of prokaryotes on Earth?
    • What is the range of prokaryote abundance in specific parts of the Earth?
      (Neither question was stated directly in the paper.)
  • What were the primary methodological approaches used?
    • No methods were described in the paper
    • Literature search/survey was done in order to obtain numbers for making estimates of the environments
    • Number calculated (of hypothetical sample) was extrapolated to estimate the number of prokaryotes present
  • Summarize the main results or findings.
    • Cellular production rate is the highest in the open ocean
    • Prokaryotic carbon amounts to approx. 60-100% of the total C found in plants (a large portion of total C on Earth)
    • Subsurface = major habitat for prokaryotes (holds more prokaryotes than any other part of the biosphere)
    • Data not completely accurate since mostly estimated by extrapolation (may need to be revised)
    • Turnover time is fastest in domestic mammals (rapidly growing population, although it’s the least in terms of population size)
    • Cellular productivity of subsurface prokaryotes comparable to that of domestic animals
  • Do new questions arise from the results?
    • Yes. The paper estimated the number of prokaryotes. How does the number of bacteria compare to that of archaea?
    • How does climate change affect prokaryote abundance? (Paper was published in 1998; if done again today, what would the results be and how would they compare?)
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • Little background information or description of methods was provided in the paper
    • The estimates came from a literature survey, so the underlying methods differ (variability of results) and carry different uncertainties (error rates). Data will vary across reported results.
    • The purpose of the paper was not stated clearly, particularly in the abstract; making it explicit would make it easier to tie the study to the bigger picture and help the reader understand its importance

Problem set 01

Learning objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.

    • Aquatic: 1.18 x 10^29 cells
      • Freshwater - 1.31 x 10^26 cells
      • Saline lakes - 1.0 x 10^26 cells
      • Marine - 1.18 x 10^29 cells
    • Soil: 2.56 x 10^29 cells
    • Subsurface: 3.8 x 10^30 cells
      • 10 cm marine
      • 8 m terrestrial
  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?

    • cell density: 5 x 10^5 cells/mL
    • Prochlorococcus spp. -> cell density: 4 x 10^4 cells/mL
    • fraction = (4 x 10^4 cells/mL) / (5 x 10^5 cells/mL) = 0.08 = 8% marine cyanobacteria
    • Significant portion of upper oceanic microbes are photosynthetic cyanobacteria needed to fix carbon -> provides C for everyone = high turnover rate
  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?

    • Autotroph: “Self-nourishing”
      • Fix inorganic carbon (CO2)
    • Heterotroph: assimilate organic carbon
    • Lithotroph: use inorganic substrates (carbon) for energy source
  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

    • Deep habitats supporting life:
    • Subsurface: Terrestrial = 4 km from sediment (a bit below the Mariana Trench)
    • Temperature is limiting factor (approx. 125 degrees Celsius); change in temp. is approx. 22 degrees Celsius per km depth.
  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?

    • 57-77 km above Earth’s surface (a bit above Mt. Everest)
    • Limiting factor is radiation
  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?

    • 61-81 km = range of Earth’s biosphere
  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)

    • (Population) x (Turnover/yr) = Cells/yr
    • Marine Heterotrophs:
      • (3.6 x 10^28 cells x 365 days/yr) / (16 days per turnover) = 8.2 x 10^29 cells/yr
  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?

    • C content = amount of organic C stored
    • C assimilation efficiency = inorganic C -> organic C
    • Turnover rate = fraction of C that leaves a reservoir per specified time
    • Carbon efficiency = 20%
      • 5-20 fg C per cell; 20 fg C/cell = 20 x 10^-30 Pg/cell
      • (3.6 x 10^28 cells) x (20 x 10^-30 Pg/cell) = 0.72 Pg C in marine
      • Heterotrophs:
      • 4 x 0.72 = 2.88 Pg/yr
    • (C consumed) / (turnover rate)
    • more sunlight near surface
    • high nutrient = high energy = high turnover rate
  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)
    • [ (4 x 10^-7 mutations per gene per cell)^4 x (3.6 x 10^28 cells / 16 days) x (1 day / 24 hrs) ]^-1 = 0.42 hrs for 4 simultaneous mutations
  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
    • Point mutations are not the only way. Horizontal (lateral) gene transfer can also occur through conjugation by plasmid, transduction by virus, and transformation by naked DNA. High genetic diversity and adaptive potential may lead to new species.
  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
    • Upper 200 m of marine habitat has higher genetic diversity than in domestic animals.
    • Open ocean prokaryotes have much greater adaptive potential than those in soil, subsurface and domestic animals.
    • Greater prokaryotic abundance in unconsolidated subsurface than in soil.
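
The arithmetic in the answers above can be checked with a short script. This is a minimal Python sketch, not part of the original worksheet: the function and constant names are illustrative, and the inputs are simply the values quoted in the answers (5 x 10^5 and 4 x 10^4 cells/mL, 3.6 x 10^28 cells, a 16-day turnover time, 20 fg C per cell, and a mutation rate of 4 x 10^-7).

```python
import math

# Values copied from the worksheet answers above (hypothetical constant names).
CELLS_UPPER_200M = 3.6e28   # marine heterotrophs in the upper 200 m (cells)
TURNOVER_TIME_DAYS = 16     # days per population turnover
MUTATION_RATE = 4e-7        # mutations per gene per DNA replication
FG_C_PER_CELL = 20          # upper end of the 5-20 fg C per cell range
PG_PER_FG = 1e-30           # 1 fg = 1e-15 g and 1 Pg = 1e15 g


def cyanobacteria_fraction(prochlorococcus_per_ml, total_per_ml):
    """Fraction of cells in the upper ocean that are Prochlorococcus."""
    return prochlorococcus_per_ml / total_per_ml


def annual_cell_production(population, turnover_time_days):
    """Cells produced per year = population x (turnovers per year)."""
    return population * 365 / turnover_time_days


def carbon_stock_pg(population, fg_c_per_cell):
    """Standing stock of cellular carbon in Pg."""
    return population * fg_c_per_cell * PG_PER_FG


def hours_between_four_mutations(population, turnover_time_days, rate):
    """Hours between occurrences of four simultaneous mutations in the population."""
    events_per_generation = population * rate ** 4
    events_per_hour = events_per_generation / (turnover_time_days * 24)
    return 1 / events_per_hour


if __name__ == "__main__":
    print(cyanobacteria_fraction(4e4, 5e5))
    print(f"{annual_cell_production(CELLS_UPPER_200M, TURNOVER_TIME_DAYS):.2e}")
    print(f"{carbon_stock_pg(CELLS_UPPER_200M, FG_C_PER_CELL):.2f}")
    print(f"{hours_between_four_mutations(CELLS_UPPER_200M, TURNOVER_TIME_DAYS, MUTATION_RATE):.2f}")
```

With these inputs the script reproduces the figures given in the answers: a fraction of 0.08, about 8.2 x 10^29 cells/yr, 0.72 Pg C, and roughly 0.42 hrs between four-mutation events.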

Evidence Worksheet_02 “Life and the Evolution of Earth’s Atmosphere”

Learning objectives:

Comment on the emergence of microbial life and the evolution of Earth systems

Problem set_02 “Microbial Engines”

Learning objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?

    • geochemical cycles:
      • tectonics & atmospheric photochemical processes
        • supply substrates and remove products
      • allow elements and molecules to interact and form/break bonds in a cyclical manner
      • Most H2 in the mantle escaped to space -> most geochemical reactions are based on acid/base reactions (transfer of H+ without e-)
      • volcanism and rock weathering resupply C, S, P
    • biochemical cycles: redox reactions
      • transfer e- and H+ from limited set of elements
      • microbially catalyzed, thermodynamically constrained redox reactions (coupled half-cells)
  • Why is Earth’s redox state considered an emergent property?

    • abiotic (acid/base) and biotic (redox) reactions altered surface redox state of the planet. Microbes took advantage of Earth’s acid/base redox energy.
      -> The evolution of these processes created the redox condition of the oceans and atmosphere

    • E.g. thermodynamic equilibrium would lead to eventual depletion of substrates essential for life -> biological oxidation of Earth driven by photosynthesis by microbes (= energy transduction process not directly dependent on preformed bond energy)

  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
    • forward and reverse reactions required to maintain cycle
      • E.g. Methanogenic Archaea: CO2 + 4 H2 -> CH4 + 2 H2O
        • If H2 is low, then the reverse reaction becomes thermodynamically favourable and is carried out by another archaeon using the methanogenic machinery in reverse.
    • sunlight replenishes energy that is lost
    • cooperation between different species to overcome thermodynamic barriers
  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox ‘niches’ and microbial groups. Is there a relationship between the nitrogen cycle and climate change?

    • nitrogen fixation = the only biological process that makes N2 accessible for synthesis of proteins/nucleic acids (N2 -> NH4+) -> biologically irreversible reaction catalyzed by nitrogenase (conserved enzyme), which is inhibited by O2
    • In the presence of O2, NH4+ is oxidized to NO2- and then to NO3-, with each step carried out by a specific group of Bacteria or Archaea. These microbes also reduce CO2 to organic C at the same time.
    • In the absence of O2, another set of microbes use NO2- and NO3- as electron acceptors in anaerobic oxidation of organic C and also forms N2 (repeating the N cycle)
    • The N cycle is spatially separated, but forms interdependent e- pool influenced by photosynthetic production of O2 and available organic matter
    • N2O is a greenhouse gas and can contribute to global warming. Also, microbes in the N cycle convert organic C to CO2, which may impact the C cycle and climate change.
  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?

    • essential pathways: are conserved
    • but as microbial diversity increases, the number of new proteins discovered increases (new number of proteins increase linearly with number of new genomes sequenced)
  • On what basis do the authors consider microbes the guardians of metabolism?
    • vertical and horizontal gene transfer allows the transfer and spread of genes needed for metabolic pathways. As such, the extinction of individual taxonomic units would not result in the extinction of core machines of the pathway.
    • machines, unlike microbes, can’t cycle waste back as substrate
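
The methanogenesis example in the answers above (balanced as CO2 + 4 H2 -> CH4 + 2 H2O) can be made concrete with the relation ΔG = ΔG° + RT ln Q. The sketch below is illustrative only and rests on assumptions that are not in the worksheet: a standard free energy of about -131 kJ/mol (an approximate textbook value), a temperature of 298.15 K, CO2 and CH4 both at 1 atm, and water activity taken as 1. The function name is hypothetical.

```python
import math

R = 8.314         # gas constant, J/(mol*K)
T = 298.15        # ASSUMED temperature, K
DG0 = -131_000.0  # ASSUMED standard free energy (J/mol) for CO2 + 4 H2 -> CH4 + 2 H2O


def delta_g(p_h2, p_co2=1.0, p_ch4=1.0):
    """Free energy change (J/mol) of methanogenesis at the given partial pressures (atm)."""
    # Reaction quotient: products over reactants, with H2 raised to its coefficient of 4.
    q = p_ch4 / (p_co2 * p_h2 ** 4)
    return DG0 + R * T * math.log(q)


if __name__ == "__main__":
    for p_h2 in (1e-3, 1e-5, 1e-7):
        direction = "forward (methanogenesis)" if delta_g(p_h2) < 0 else "reverse"
        print(f"p(H2) = {p_h2:g} atm -> favoured direction: {direction}")
```

Under these assumptions the forward reaction is favourable at higher H2 partial pressures, and the sign of ΔG flips once H2 becomes scarce enough, which is the point made in the answer about the reverse reaction.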

Evidence worksheet_03 “The Anthropocene”

Learning objectives:

Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General Questions:

  • What were the main questions being asked?

    • Have humans changed the Earth system so much that the geological deposits are distinct from the Holocene & earlier epochs?
    • If so, when did this stratigraphic signal become recognizable worldwide?
  • What were the primary methodological approaches used?
    • There was no method, but they reviewed data from previously published papers
    • They analyzed the published data to summarize the effects humans have on the environment and also to draw a timeline of events
    • Used stratigraphic record to compare to previous epochs
  • Summarize the main results or findings.
    • Anthropocene: period marked by humans that changed Earth’s history (occurred recently)
    • Invention of widespread and persistent materials; rapid production of those materials leaves identifiable fossil and geochemical records (e.g. aluminum, plastics)
    • modification of sedimentary processes
      • transformed > 50% of Earth’s land for: landfills, urban structures, mining, cultivation, etc.
      • rising sea levels, eutrophication, coral bleaching, etc.
    • N and P in soils doubled due to increased fertilizer use
    • reactive N amount increased by 120% relative to Holocene baseline
    • Sea-level rise
    • Changes in climate and environment exceed Late Holocene changes
    • Radiogenic signatures and radionuclides in sediments/ice
      • excess 14C (suggested as potential marker for start of Anthropocene)
      • 239+240Pu fallout from nuclear weapons testing
    • CO2 above 400ppm
  • Do new questions arise from the results?
    • What are current innovations/strategies employed to slow down/solve the problems?
    • Although the authors listed various changes caused by humans, how would the authors officially define the start of Anthropocene? What is the boundary?
    • What other purposes are there to label this era as Anthropocene other than for historic purposes?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

    • The authors provided sufficient background information and data to reinforce their points
    • There were many figures, but the authors referred to them only briefly, if at all, in the discussion, so the figures felt like stand-alone items (although they are related to the text).
    • The figures had good descriptions and were easy to understand.

Module_1 essay

Human beings are innovative creatures that have engineered many machines for technological advancement and manipulation of the environment, be it for better or for worse. That being said, are humans actually at the helm of Spaceship Earth, or are we only capable of living in the presence of a microbial world? We have utilized microbes to our advantage in innumerable processes, ranging from legume cropping in agriculture to fermentation in alcohol production, as well as denitrification in sewage treatment. As a matter of fact, humans relied on microbes long before we acknowledged their existence. Microbes were present on Mother Earth long before the existence of Homo sapiens and established the distinctive, phenotypic features of Earth’s atmosphere that make it suitable for human life. In addition, microbes are an essential part of Earth’s biogeochemical cycles, which are required to sustain the lives of the many other organisms that benefit the human species. Thus, humans live in a microbial world and would not be able to survive without the metabolic processes provided by microbes. Microbes, on the other hand, would easily thrive in a human-less world.

Microbes existed long before the birth of the human race and can easily continue to thrive even in the absence of human intervention. To begin with, it is important to discuss the history of microbial life in order to understand and appreciate the influence these microorganisms have on Earth and her environment. According to geological evidence, microbial activity has been present on Earth for at least 3.5 billion years (Gyr) (5). Evidence derived from stable-isotope fractionation of 3.5-Gyr stromatolites suggested possible sulfate reduction and methanogenesis activity (2). In addition, molecular fossils from late-Archaean rocks (i.e. 3.0 - 2.5 Gyr ago) provided evidence of preserved biomarkers, such as 2α-methylhopanes from cyanobacteria, allowing for further inference of the types of microbes and metabolic processes present during that period (5). Furthermore, microbes can influence their surrounding environment through the production of metabolic products and byproducts required for survival. An example would be the nitrogen cycle, where multiple species of microbes interact, forming a nutrient cycle. These interdependent relationships, in which one species of microbe utilizes the waste of another species as substrate to generate energy for growth and reproduction, existed long before human intervention and would continue to exist due to the self-sustainability of the symbiosis between these microbial colonies (2). Despite many volcanic eruptions, meteorite impacts and extinction events during Earth’s early years, many diverse microbes persisted and continue to flourish today. In addition to being diverse and possessing remarkable adaptability, microbes can pass genes to other organisms by horizontal gene transfer (HGT), which preserves essential metabolic pathways (2). If the original taxonomic unit were to go extinct, the dispersion of essential core genes by HGT ensures the conservation of core metabolic pathways under the selective pressure of the environment. In other words, non-essential and irrelevant genes are lost, while those that are important are preserved. Microbes have a long history on Earth and have played an important role in shaping the planet which we currently inhabit.

The human race as a whole is dependent on microbial activity for survival on Earth. First, microbes have established the current phenotypic atmosphere of Earth, allowing human existence. The atmosphere of the early Earth was anoxic and had a completely different composition than the atmosphere we have today (4). Abiotic and biotic factors such as volcanic eruptions and methane-producing microbes contributed to an atmosphere rich in carbon dioxide and methane (2). However, 2.3 Gyr ago, there was an initial rise in oxygen largely due to oxygenic photosynthesis performed by cyanobacteria (4). Although this rise in oxygen concentration, known as the Great Oxygenation Event, led to a mass extinction, the oxic atmosphere allowed the evolution of many aerobic organisms and brought about the species which would eventually be known as Homo sapiens (3).

In addition to establishing an atmosphere suitable for humans, microbes were capable of taking advantage of the abiotic geochemical reactions produced by Earth and have co-evolved to form biogeochemical cycles (2). These biogeochemical cycles are essential for the production of major elements required for building biological macromolecules such as proteins, lipids and nucleic acids and thus, are crucial for sustaining human life (2). Biogeochemical cycles are largely dependent on reduction-oxidation (redox) reactions mediated by microbes. These important cycles create fluxes of major essential elements such as hydrogen (H), carbon (C), nitrogen (N), oxygen (O) and sulfur (S) that are required for the synthesis of all biological macromolecules and ultimately, human survival (2). An example of a major biogeochemical cycle is the nitrogen cycle. The N cycle is formed by microbes and is essential for providing living organisms access to nitrogen, needed for the synthesis of nucleic acids and proteins (1). Although approximately 78% of Earth’s atmosphere is composed of molecular nitrogen (N2), atmospheric nitrogen is inert and not readily available for human use (5). Nitrogen fixation is the only biological process currently available which allows the conversion of N2 to ammonium (NH4+), providing accessible nitrogen to humans, animals and non-nitrogen fixing plants altogether (2). Certain microbes, such as cyanobacteria, express nitrogenases and can catalyze the irreversible reaction to fix nitrogen (5). Following nitrogen fixation, nitrification and denitrification by certain microbes occur to convert the available nitrogen back into N2 to maintain the metabolic cycle and a balance between atmospheric nitrogen and available nitrogen (2). With the rapid growth of the human population, food becomes scarce and methods for increasing provisions are needed. Taking advantage of the endosymbiotic nitrogen-fixing microbes found in legumes, humans have developed sustainable approaches for agriculture, such as legume cropping in lieu of synthetic fertilizers, in order to provide accessible nitrogen to crops (1). Thus, microbes have played and continue to play an integral role in the storybook of human survival.

In conclusion, microbes have played a vital part in providing the appropriate environment required for human existence and continue to play an integral part in human life on Earth. Not only did microbes live long before the appearance of Homo sapiens, they have also evolved and adapted to the harsh geological environment, whether shaped by natural causes or by human influence, and have changed the phenotype of Earth’s atmosphere to establish a habitable place for humans today. Lastly, microbial metabolic pathways are necessary to maintain biogeochemical cycles. This allows microbes to catalyze thermodynamically constrained redox reactions of the cycles, allowing their own survival and that of other living organisms.

Module 01 references

Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.

  • Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863
  • Achenbach, J. (2012). Spaceship Earth: A new view of environmentalism. Washington Post (January 2, 2012).
  • Falkowski, P. G., Fenchel, T., & Delong, E. F. (2008). The microbial engines that drive Earth’s biogeochemical cycles. Science, 320(5879), 1034-1039.
  • Canfield, D. E., Glazer, A. N., & Falkowski, P. G. (2010). The evolution and future of Earth’s nitrogen cycle. Science, 330(6001), 192-196.
  • Kasting, J. F., & Siefert, J. L. (2002). Life and the evolution of Earth’s atmosphere. Science, 296(5570), 1066-1068.
  • Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature, 409(6823), 1083.
  • Leopold, A. (2014). The land ethic. In The Ecological Design and Planning Reader (pp. 108-121). Island Press, Washington, DC.
  • Falkowski, P., Scholes, R. J., Boyle, E. E. A., Canadell, J., Canfield, D., Elser, J., … & Mackenzie, F. T. (2000). The global carbon cycle: a test of our knowledge of earth as a system. Science, 290(5490), 291-296.
  • Kallmeyer, J., Pockalny, R., Adhikari, R. R., Smith, D. C., & D’Hondt, S. (2012). Global distribution of microbial abundance and biomass in subseafloor sediment. Proceedings of the National Academy of Sciences, 109(40), 16213-16216.
  • Rockström, J., Steffen, W., Noone, K., Persson, Å., Chapin III, F. S., Lambin, E. F., … & Nykvist, B. (2009). A safe operating space for humanity. Nature, 461(7263), 472.
  • Schrag, D. P. (2012). Geobiology of the Anthropocene. Fundamentals of Geobiology, 425-436.
  • Waters, C. N., Zalasiewicz, J., Summerhayes, C., Barnosky, A. D., Poirier, C., Galuszka, A., … & Jeandel, C. (2016). The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science, 351(6269), aad2622.
  • Zehnder, A. J. (1988). Biology of anaerobic microorganisms. John Wiley and Sons Inc.

References for Essay:
1. Canfield, DE, Glazer, AN, Falkowski, PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science. 330:192-196. doi: 10.1126/science.1186120.
2. Falkowski, PG, Fenchel, T, Delong, EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320:1034-1039. doi: 10.1126/science.1153213.
3. Fischer, WW, Hemp, J, Valentine, JS. 2016. How did life survive Earth’s great oxygenation? Curr. Opin. Chem. Biol. 31:166-178. doi: 10.1016/j.cbpa.2016.03.013.
4. Kasting, J, Siefert, J. 2003. Life and the evolution of Earth’s atmosphere (vol 296, pg 1066, 2002). Science. 299:1015-1015.
5. Sleep, NH, Nisbet, EG. 2001. The habitat and nature of early life. Nature. 409:1083-1091. doi: 10.1038/35059210.

Module 02

Remapping the Body of the World

Problem set_03 “Metagenomics: Genomic Analysis of Microbial Communities”

Learning objectives:

Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)

Specific Questions:

  • How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?
    • Solden et al. (2016). “The Bright Side of Microbial Dark Matter: Lessons Learned from the Uncultivated Majority”. -> 89 bacterial and 20 archaeal phyla recognized by small subunit rRNA
      • only 0.1 - 1% of microbes in environment have been cultured
    • Youssef et al. (2015). “Assessing the Global Phylum Level Diversity Within the Bacterial Domain: A Review”
      • 70 bacterial phyla with no cultured representatives (this number may be greater)
      • Earlier surveys, confined to pure cultures, found 12 bacterial phyla with cultured representatives: Actinobacteria, Bacteroidetes, Chlamydiae, Chlorobi, Chloroflexi, Cyanobacteria, Thermi (Deinococcus-Thermus), Firmicutes, Planctomycetes, Proteobacteria, Spirochaetes, Thermotogae
  • How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
    • From EBI Metagenomics (Europe)
      • 1486 public projects
      • 86201 samples
      • soil, marine, grassland, fecal, rumens, agriculture, etc.
    • JGI (U.S.)
      • 36715 public projects
  • What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
  • What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
    • Phylogenetic anchors:
      • link other unknown genes within genome to a taxon
      • collection of genes can be placed into bins of discrete taxonomic groups
      • only one copy per cell; ensures no HGT, and genes only appear once in tree
      • Looks at size of bins for relative abundance of taxa
    • Functional anchors:
      • tells what the cell does
      • search for a gene that codes for a protein involved in the terminal steps of a pathway
      • can count abundance of phylogenetic anchor
  • What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
    • binning: grouping reads or contigs and assigning them to operational taxonomic units (OTU)
      • Taxonomy dependent - classifies DNA fragments by comparing to reference database
      • Taxonomy independent - reference free binning by clustering similar features (e.g. GC content, etc.)
    • sequence composition based binning -> genomic signatures
      • Interpolated Markov Model (IMM)
    • abundance based binning
      • Expectation-Maximization (EM) algorithm
      • Lander-Waterman model
    • Hybrid binning
    • Sedlar et al. (2016) “Bioinformatics strategies for taxonomy independent binning and visualization of sequence in shotgun metagenomics”
  • Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?
    • Douterelo et al. (2014). “Methodological approaches for studying the microbial ecology of drinking water distribution systems.”
    • Clone dependent shotgun library
      • Find functions of proteins from the gene of uncultivable microbes by cloning gene fragments into an expression host
      • Limitation: gene may not be compatible with the host
    • Pacific Bioscience
    • Nanopore
    • Metatranscriptomes
      • RT-qPCR: high sensitivity; can run multiple samples and analyze different genes at once
        • quantitative representation of changes in gene expression due to different treatments/controls
        • Limitations:
          • RNA is highly unstable
          • DNA extracted needs to be pure
      • Functional Microarrays:
        • Overall assessment of gene expression within a microbial community
        • simultaneous evaluation of many mRNA (rapid)
        • Limitations:
          • liquid chromatography/2D gels for protein separation can be tedious
          • Must use LC-MS (Liquid chromatography-mass spectrometry) to characterize samples
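
The taxonomy-independent binning idea from the binning answer above (clustering contigs by shared features such as GC content) can be sketched in a few lines of base R. This is only an illustration: the four contigs are made-up sequences, `gc_content` is a hypothetical helper, and real binning tools use richer genomic signatures than GC content alone.

```r
# Toy contigs: invented sequences for illustration, not real data.
contigs <- c(
  c1 = "ATATTTAAATATTAATATAT",
  c2 = "TTAATATAAATTTATAATAT",
  c3 = "GCGCGGCCGCGGGCCGCGCC",
  c4 = "CCGGGCGCGCCGGCGCGGCG"
)

# Hypothetical helper: fraction of G or C bases in one sequence.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

gc <- vapply(contigs, gc_content, numeric(1))

# Reference-free binning: hierarchical clustering on GC content, cut into 2 bins.
bins <- cutree(hclust(dist(gc)), k = 2)

# List the contigs that ended up in each bin.
split(names(contigs), bins)
```

Contigs with similar composition land in the same bin without any reference database, which is also why bins built this way can mix fragments from unrelated organisms that happen to share a signature.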

Evidence worksheet_04 “Bacterial Rhodopsin Gene Expression”

Learning objectives:

  • Discuss the relationship between microbial community structure and metabolic diversity

  • Evaluate common methods for studying the diversity of microbial communities

  • Recognize basic design elements in metagenomic workflows

General Questions:

  • What were the main questions being asked?
    • To describe and understand more about proteorhodopsins (PRs), photosystem genetics and biochemistry (found that proteorhodopsin isn’t needed, but provides extra energy in the form of ATP)
    • What is the physiological basis of light-activated growth stimulation and the function of the various PR photosystems?
    • Can we recover a functional PR-system in a metagenomic screening context?
    • Can a single genetic event (HGT/LGT operon) confer PR-system functionality?
  • What were the primary methodological approaches used?
    • Created fosmid library (gene library) -> clone in E. coli -> Screen by colour change -> extract carotenoid gene
    • Sequenced to completion using Tn5-Seq
      • Use copy control and induced with arabinose
        • slight colour shift may have been hard to discern (so 3/12000 hits)
    • Proton-Pumping experiment and ATP measurements
    • HPLC Analysis of metabolites
  • Summarize the main results or findings.
    • Only 6 genes are required to enable light-activated proton translocation and photophosphorylation
      • these 6 genes found in marine bacterial taxa
        • these are necessary and sufficient for complete synthesis and assembly of a fully functional PR photoprotein in E. coli
    • demonstrated that illumination of cells expressing a native marine bacterial PR photosystem generates a PMF that drives cellular ATP synthesis
    • a single genetic event can result in the acquisition of phototrophic capabilities in chemoorganotrophic microorganisms
  • Do new questions arise from the results?
    • Paper mentioned that light stimulation/yield in Pelagibacter ubique in previous study had a negative result that may be due to the influence of various factors in seawater. Are there perhaps other reasons it didn’t work on P. ubique?
    • Why were only 3/12280 clones identified?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • Sufficient background information was provided
    • The figures and figure font were a bit too small to read without having to zoom in 300%

Module 02 references

Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.

  • Madsen, E. L. (2005). Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology, 3(5), 439. link
  • Taupp, M., Mewis, K., & Hallam, S. J. (2011). The art and design of functional metagenomic screens. Current opinion in biotechnology, 22(3), 465-472. link
  • Martinez, A., Bradley, A. S., Waldbauer, J. R., Summons, R. E., & DeLong, E. F. (2007). Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences, 104(13), 5590-5595. link
  • Wooley, J. C., Godzik, A., & Friedberg, I. (2010). A primer on metagenomics. PLoS computational biology, 6(2), e1000667. link

Module 03

Microbial Species Concepts


Problem set_04 “Fine-scale phylogenetic architecture”

Learning objectives:

  • Gain experience estimating diversity within a hypothetical microbial community

Part 1: Description and enumeration

Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.

Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.

Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.

For example, load in the packages you will use.

#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)

Then load in the data. You should use a similar format to record your community data.

example_data1 = data.frame(
  number = c(1,2,3),
  name = c("lion", "tiger", "bear"),
  characteristics = c("brown cat", "striped cat", "not a cat"),
  occurences = c(2, 4, 1)
)

Finally, use these data to create a table.

example_data1 %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
number  name   characteristics  occurences
1       lion   brown cat        2
2       tiger  striped cat      4
3       bear   not a cat        1

For your community:

  • Construct a table listing each species, its distinguishing characteristics, the name you have given it, and the number of occurrences of the species in the collection.
  • Ask yourself if your collection of microbial cells from seawater represents the actual diversity of microorganisms inhabiting waters along the Line-P transect. Were the majority of different species sampled or were many missed?
candy_counts = read.table(file="Candy_community.csv", header=TRUE, sep=",", na.strings=c("NAN", "NA", "."))
candy_table = candy_counts %>% kable("html")

#The collection of microbial cells from seawater does not represent the actual diversity of microorganisms inhabiting waters along the Line-P transect. Many were missed, especially species whose abundance was low in the first place.

Part 2: Collector’s curve

To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.

To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.

For example, we load in these data.

example_data2 = data.frame(
  x = c(1,2,3,4,5,6,7,8,9,10),
  y = c(1,2,3,4,4,5,5,5,6,6)
)

And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.

ggplot(example_data2, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'

For your sample:

  • Create a collector’s curve for your sample (not the entire original community).
  • Does the curve flatten out? If so, after how many individual cells have been collected?
  • What can you conclude from the shape of your collector’s curve as to your depth of sampling?
candy_count_collector = data.frame(
  x = c(1,13,14,15,27,37,49,50,52,55,58,67,76,93,104,113,122,127,128,129,137,138,139,141,150,154,160,164,165,167,171,177,182,184),
  y = c(1,2,3,4,4,5,6,6,6,7,7,7,8,8,8,8,8,8,8,9,9,9,10,10,10,10,10,10,11,11,11,11,11,11)
)

ggplot(candy_count_collector, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'

# The curve flattens out at around 10 species, when the number of cells collected is 85.
# At the beginning and towards the end, sampling additional individuals increases the species count more than in the middle. This is probably because at the start there is a greater chance of drawing cells that are not of the same species. Likewise, near the end when only a few cells remain, cells from species that have few individuals will eventually be sampled. In the middle, there is a greater likelihood of sampling cells from the same species, which explains the plateau near the centre of the curve.
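
The x and y values of a collector's curve do not have to be tallied by hand. As a minimal sketch, assuming the cells are recorded as a vector of species labels in the order they were drawn, the cumulative species count is a running total of first appearances. The `draws` vector and the `collectors_curve` helper below are hypothetical and are not part of the problem set.

```r
# Hypothetical order in which cells were drawn (labels are invented).
draws <- c("bear", "rod", "bear", "skittle", "rod", "bear", "mm", "skittle")

# x = cumulative individuals classified, y = cumulative species observed.
collectors_curve <- function(labels) {
  data.frame(
    x = seq_along(labels),
    # !duplicated() is TRUE the first time a label appears, so the
    # running sum counts how many distinct species have been seen so far.
    y = cumsum(!duplicated(labels))
  )
}

curve <- collectors_curve(draws)
curve
```

The resulting data frame has the same `x`/`y` layout as `example_data2`, so it can be passed to the same `ggplot` call used for the curves above.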

Part 3: Diversity estimates (alpha diversity)

Using the table from Part 1, calculate species diversity using the following indices or metrics.

Diversity: Simpson Reciprocal Index

\(\frac{1}{D}\) where \(D = \sum p_i^2\)

\(p_i\) = the fractional abundance of the \(i^{th}\) species

For example, using example data 1 with 3 species of 2, 4, and 1 individuals each, \(\frac{1}{D}\) =

species1 = 2/(2+4+1)
species2 = 4/(2+4+1)
species3 = 1/(2+4+1)

1 / (species1^2 + species2^2 + species3^2)
## [1] 2.333333
#Sample

Gummy_bear = 23/184
Gummy_sour_bears = 1/184
Gummy_Rods = 48/184
Large_gummy = 0/184
Gummy_sour_swirls = 1/184
Gummy_spiders = 0/184
Gummy_cokes= 0/184
Gummy_string = 0/184
Small_brick = 4/184
Large_brick = 0/184
Skittles = 49/184
MM = 49/184
Twizzlers = 0/184
Kisses = 1/184
Gummy_balls = 6/184
Gummy_fruit = 1/184
Mutated = 1/184

Simpsons = 1/ (Gummy_bear^2 + Gummy_sour_bears^2 + Gummy_Rods^2 + Large_gummy^2 + Gummy_sour_swirls^2 + Gummy_spiders^2 + Gummy_cokes^2 + Gummy_string^2 + Small_brick^2 + Large_brick^2 + Skittles^2 + MM^2 + Twizzlers^2 + Kisses^2 + Gummy_balls^2 + Gummy_fruit^2 + Mutated^2)
#Community

C_Gummy_bear = 105/793
C_Gummy_sour_bears = 3/793
C_Gummy_Rods = 174/793
C_Large_gummy = 2/793
C_Gummy_sour_swirls = 3/793
C_Gummy_spiders = 6/793
C_Gummy_cokes= 3/793
C_Gummy_string = 7/793
C_Small_brick = 15/793
C_Large_brick = 3/793
C_Skittles = 192/793
C_MM = 222/793
C_Twizzlers = 14/793
C_Kisses = 16/793
C_Gummy_balls = 24/793
C_Gummy_fruit = 2/793
C_Mutated = 2/793

C_Simpsons = 1/ (C_Gummy_bear^2 + C_Gummy_sour_bears^2 + C_Gummy_Rods^2 + C_Large_gummy^2 + C_Gummy_sour_swirls^2 + C_Gummy_spiders^2 + C_Gummy_cokes^2 + C_Gummy_string^2 + C_Small_brick^2 + C_Large_brick^2 + C_Skittles^2 + C_MM^2 + C_Twizzlers^2 + C_Kisses^2 + C_Gummy_balls^2 + C_Gummy_fruit^2 + C_Mutated^2)

The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that a community can have the same number of species (equal richness) but manifest a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.

  • What is the Simpson Reciprocal Index for your sample?
    • 4.401
  • What is the Simpson Reciprocal Index for your original total community?
    • 4.881
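
The per-species arithmetic above can also be written as one vectorized helper. This is a sketch only: `inv_simpson` is a hypothetical name, and the counts are the sample counts already listed in this section, in the same order.

```r
# Simpson Reciprocal Index, 1/D with D = sum(p_i^2).
inv_simpson <- function(counts) {
  p <- counts / sum(counts)  # fractional abundance of each species
  1 / sum(p^2)
}

# Example data 1: 3 species with 2, 4, and 1 individuals.
inv_simpson(c(2, 4, 1))

# Sample counts from this section (184 cells in total).
sample_counts <- c(23, 1, 48, 0, 1, 0, 0, 0, 4, 0, 49, 49, 0, 1, 6, 1, 1)
inv_simpson(sample_counts)
```

Species with zero counts contribute nothing to the sum, so they can stay in the vector without affecting the index.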
Richness: Chao1 richness estimator

Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.

\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)

\(S_{obs}\) = total number of species observed
\(a\) = species observed once
\(b\) = species observed twice or more

So for our previous example community of 3 species with 2, 4, and 1 individuals each, \(S_{chao1}\) =

3 + 1^2/(2*2)
## [1] 3.25
#sample
11 + 5^2/(2*6)
## [1] 13.08333
#Community
17 + 5^2/(2*12)
## [1] 18.04167
  • What is the chao1 estimate for your sample?
    • 13.08
  • What is the chao1 estimate for your original total community?
    • 18.04
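
The hand calculation above can be wrapped in a small function. The sketch below follows the definitions exactly as given in this worksheet (a = species observed once, b = species observed twice or more). Note that the commonly published form of Chao1 takes b as the number of species observed exactly twice, so other tools may report different values. `chao1_worksheet` is a hypothetical name.

```r
# Chao1 as defined in this worksheet:
#   S_obs + a^2 / (2 * b)
#   a = species observed once, b = species observed twice or more
chao1_worksheet <- function(counts) {
  s_obs <- sum(counts > 0)   # species actually observed
  a <- sum(counts == 1)
  b <- sum(counts >= 2)
  s_obs + a^2 / (2 * b)
}

# Example community: 3 species with 2, 4, and 1 individuals.
chao1_worksheet(c(2, 4, 1))

# Sample counts from Part 3 (11 observed species, 5 of them seen once).
sample_counts <- c(23, 1, 48, 0, 1, 0, 0, 0, 4, 0, 49, 49, 0, 1, 6, 1, 1)
chao1_worksheet(sample_counts)
```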

Part 4: Alpha-diversity functions in R

We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.

library(vegan)

First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table with species as columns and samples as rows (of which you only have 1).

example_data1_diversity = 
  example_data1 %>% 
  select(name, occurences) %>% 
  spread(name, occurences)

example_data1_diversity
##   bear lion tiger
## 1    1    2     4
Candycount_sample = read.table(file="Candycount_sample.csv", header=TRUE, sep=",", na.strings=c("NAN", "NA", "."))

candy_diversity =
  Candycount_sample %>%
  select(Name, Count) %>%
  spread(Name, Count)

candy_diversity
##   Gummy_balls Gummy_bear Gummy_cokes Gummy_fruit Gummy_Rods
## 1           6         23           0           1         48
##   Gummy_sour_bears Gummy_sour_swirls Gummy_spiders Gummy_string Kisses
## 1                1                 1             0            0      1
##   Large_brick Large_gummy MM Mutated Skittles Small_brick Twizzlers
## 1           0           0 49       1       49           4         0
Candycount_community= read.table(file="Candycount_community.csv", header=TRUE, sep=",", na.strings=c("NAN", "NA", "."))

candy_diversityC =
  Candycount_community %>%
  select(Name, Count) %>%
  spread(Name, Count)

candy_diversityC
##   C_Gummy_balls C_Gummy_bear = 105/796 C_Gummy_cokes C_Gummy_fruit
## 1            24                    105             3             2
##   C_Gummy_Rods C_Gummy_sour_bears C_Gummy_sour_swirls C_Gummy_spiders
## 1          174                  3                   3               6
##   C_Gummy_string C_Kisses C_Large_brick C_Large_gummy C_MM C_Mutated
## 1              7       16             3             2  222         2
##   C_Skittles C_Small_brick C_Twizzlers
## 1        192            15          14

Then we can calculate the Simpson Reciprocal Index using the diversity function.

diversity(example_data1_diversity, index="invsimpson")
## [1] 2.333333
diversity(candy_diversity, index="invsimpson")
## [1] 4.401456
diversity(candy_diversityC, index="invsimpson")
## [1] 4.881042

And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number so the value will be slightly different than what you’ve calculated above.

specpool(example_data1_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All       3    3       0     3        0     3    3       0 1
specpool(candy_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All      11   11       0    11        0    11   11       0 1
specpool(candy_diversityC)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All      17   17       0    17        0    17   17       0 1

In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.

For your sample:

  • What are the Simpson Reciprocal Indices for your sample and community using the R function?
    • They are the same as the ones we calculated beforehand (4.401 and 4.881 for sample and community respectively).
  • What are the chao1 estimates for your sample and community using the R function?
    • They are 11 and 17 for sample and community, respectively. These do not match our hand calculations because the csv files created for this analysis give only the total number of cells per species (undistinguished by colour).
    • Verify that these values match your previous calculations.
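
One more possible contributor to the mismatch, offered as a hedged note rather than a definitive explanation: as we understand the vegan documentation, `specpool` extrapolates from species incidences across several sites, so with a single row it may simply return the observed richness. vegan also provides `estimateR`, which works from abundance counts in a single sample. The sketch below assumes vegan is installed and uses the example data counts. `estimateR` may apply a bias-corrected Chao1 formula based on species seen exactly once and exactly twice, so its output is not expected to equal the worksheet formula.

```r
# Counts from example data 1 (species as names, counts as values).
counts <- c(bear = 1, lion = 2, tiger = 4)

# Only run if vegan is available, to keep the sketch self-contained.
if (requireNamespace("vegan", quietly = TRUE)) {
  # Abundance-based richness estimates for a single sample.
  est <- vegan::estimateR(counts)
  print(est[c("S.obs", "S.chao1")])
}
```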

Part 5: Concluding activity

If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.

  • How does the measure of diversity depend on the definition of species in your samples?
    • Depending on how we decide to group the candy types (based on characteristics such as colour or shape), a species can be made bigger or smaller. The more characteristics we separate the candies by, the higher the diversity we will have (since more species will be created).
  • Can you think of alternative ways to cluster or bin your data that might change the observed number of species?
    • We could group the candies based on colour or size, rather than brand.
  • How might different sequencing technologies influence observed diversity in a sample?
    • Different sequencing technologies have different post-sequencing processing procedures (such as how they are assembled) and quality controls. These can introduce sequencing errors on different levels, such as biases generated from using different models for calculations.

Project_1

Team 1: Albert Chang (26234147), Alison Fong (33399149), Karen Lau (16524143), Yaqian Luo (59751503), Jessica Ngo (31837131), Peter (Kiet) Truong (36645133)

Abstract

To study the diversity and biochemical responses of microbial communities in the context of oxygen minimum zones (OMZs), Saanich Inlet was used as a model ecosystem from which water samples were collected at seven major depths spanning the oxycline. A metagenomic study was conducted in which genomic DNA was extracted from the water samples, PCR amplified, assembled into contiguous sequences, and processed into operational taxonomic units (OTUs) and amplicon sequence variants (ASVs) using mothur and QIIME2 pipelines. Based on OTU and ASV results, we chose to focus in on Cyanobacteria as our taxon of interest. Both mothur and QIIME2 data produced alpha diversity values based on Shannon’s diversity index that suggest a decrease of Cyanobacterial abundance as depth increases. In addition, Cyanobacterial abundance significantly differs across depth and oxygen concentration according to both mothur and QIIME2 data; specifically, abundance significantly decreases at deeper depths and in environments with lower concentrations of oxygen. Within the Cyanobacteria phylum, there are 15 OTUs and 51 ASVs across all samples. Abundance of five OTUs from the mothur pipeline showed significant changes across both depth and oxygen. In contrast, abundance of none of the ASVs from the QIIME2 pipeline showed significant changes across depth, although 17 ASVs showed significant changes across oxygen concentrations. An increase of Cyanobacterial abundance at shallow depths may be explained in part by their ability to absorb red and orange light at the upper boundaries of the water column. This is also supported by the significant change in oxygen and chlorophyll A concentrations across the depth profile. 
Another explanation for low Cyanobacterial abundance at lower depths may be due to changes in temperature; where temperature drops below 10 °C at 100 m, Cyanobacterial growth stops completely.

Introduction

Oxygen minimum zones (OMZs) are areas in the ocean where dissolved oxygen concentrations fall below 20 \(\mu\)M (1). Due to temperature increases and other effects caused by global warming, OMZs are expanding at a notable rate. Saanich Inlet, a seasonally anoxic fjord off the coast of British Columbia, is a model ecosystem for studying the diversity and biochemical responses of microbial communities to the hypoxic environments commonly observed in OMZs (1, 2). In particular, Saanich Inlet has been used to model the metabolic coupling and symbiotic interactions that occur in OMZs (3). The inlet undergoes recurring cycles of water column stratification and deep water renewal, rendering it a model ecosystem for studying microbial responses to changes in ocean deoxygenation levels (4). Increased levels of primary productivity in ocean surfaces during the spring season, as well as the limited mixing which occurs between the basin and surface waters both result in the development of an anoxic body of water with increasing depth in the Inlet (2). These anoxic regions become populated with chemolithoautotrophs, and eventually lead to a decrease in aerobically respiring organisms found deeper within these zones. Past studies have demonstrated that these kinds of metabolic shifts generally lead to a loss of nitrogen along with the production of greenhouse gases, most notably methane (CH4) and nitrous oxide (N2O) (1).

In order to investigate the changes that occur in these OMZs, water samples from various depths were collected from Saanich Inlet. Genomic DNA was extracted from these samples for a metagenomic study, which overcomes the barrier of uncultivability and enables a more thorough exploration of the relationship between microbes and their communities based on the genetic distribution of metabolic processes (5, 6, 7, 8). The extracted DNA is sequenced to generate raw data, which can then be assembled into contiguous sequences. These contigs generated by amplicon sequencing are then compared to a sequence database to determine the microbial taxa present in the environment at each water depth. This involves processing the sequencing data, and there currently exist two approaches to this type of data analysis: operational taxonomic units (OTUs) and amplicon sequence variants (ASVs). OTU-based pipelines cluster reads that differ by less than a fixed dissimilarity threshold (9). This allows more data to be kept, although some OTUs may not be representative of the actual taxa in the community. In contrast, ASV-based pipelines resolve sequence variants by inferring the biological sequences present in the sample prior to amplification and possible sequencing errors, and are able to distinguish variants that differ by even one nucleotide (9). This treats each ASV as an individual species, though it has the potential to discard more data and to bias towards sequences that are less error-prone.

The objective of this paper is to analyze the data generated from both OTU and ASV pipelines in order to decide which produces more logical inferences, and ultimately to determine which pipeline would be preferred for future analysis of collected water samples. The taxon selected for this comparison was the phylum Cyanobacteria. Cyanobacteria was selected because it has sufficient numbers of OTUs and ASVs to make sound comparisons, but not so many that the analysis would be computationally infeasible.

Methods

Sampling

Samples were obtained on Saanich Inlet Cruise 72 and taken from seven major depths spanning the oxycline: 10, 100, 120, 135, 150, 165 and 200 m. Waters were filtered, and genomic DNA was extracted. Further sampling details can be found in (3).

DNA Sequencing

Samples were PCR amplified using the 515F and 808R primers, then sequenced according to the standard operating protocol on an Illumina MiSeq platform with Phred33 quality scores.

Data Processing

Sequences were processed using either mothur or QIIME2 as follows:

Mothur Pipeline: mothur was used to clean up the data. In brief, paired end reads were combined into contigs using their overlapping regions. Low quality sequences, useless sequence data, chimeric sequences and singletons were removed. OTUs were then determined at 97% similarity. OTUs were classified using the SILVA databases, and the taxonomies for each OTU were condensed. The OTU table, taxonomy data and sample metadata were subsequently cleaned up and combined into a phyloseq object.

QIIME2 Pipeline: Demultiplexed sequences were imported into QIIME as manifest reads. QIIME was used to clean up the data along with ASV determination in one step. Sequence quality was visually evaluated, and sequence quality trimming was conducted. All other trimming/filtering parameters were left as default. ASV determination was completed using the Dada2 protocol. ASV classification was completed using the Silva version 119 database at 99% similarity. The ASV table, taxonomy data and sample metadata were subsequently cleaned up and combined into a phyloseq object.

Data Analysis

The aforementioned phyloseq objects were imported into R version 3.4.3 (Windows) or 1.1.383 (Mac). The tidyverse, phyloseq, magrittr, knitr and cowplot packages were loaded and used to complete the data analysis. Data was piped into linear models and ANOVA tests to determine statistical significance at the 95% confidence level.
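
As a minimal sketch of the kind of test described here, the block below fits a linear model of abundance against depth and reads off the p-value at the 0.05 level. The depths are the seven sampled depths listed under Sampling. The abundance values are invented purely for illustration and are not the project data. The column names `Depth_m` and `Abundance` mirror those in the model output later in this report.

```r
# Depths from the Sampling section; abundances are INVENTED toy values.
toy <- data.frame(
  Depth_m   = c(10, 100, 120, 135, 150, 165, 200),
  Abundance = c(200, 60, 50, 30, 25, 10, 5)
)

# Linear model of abundance as a function of depth.
fit <- lm(Abundance ~ Depth_m, data = toy)

# p-value for the depth coefficient, taken from the model summary.
p_depth <- summary(fit)$coefficients["Depth_m", "Pr(>|t|)"]

# ANOVA table for the same model (same p-value with a single predictor).
anova(fit)

# Significant at the 95% confidence level?
p_depth < 0.05
```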

Environment setup and Data Cleaning
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## Warning: package 'cowplot' was built under R version 3.4.4
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
## You set `rngseed` to FALSE. Make sure you've set & recorded
##  the random seed of your session for reproducibility.
## See `?set.seed`
## ...
## 614OTUs were removed because they are no longer 
## present in any sample after random subsampling
## ...
## You set `rngseed` to FALSE. Make sure you've set & recorded
##  the random seed of your session for reproducibility.
## See `?set.seed`
## ...
## 6OTUs were removed because they are no longer 
## present in any sample after random subsampling
## ...

Results

  1. How does microbial community structure change with depth and oxygen concentration?

Alpha-diversity comparison:
Mothur
As shown in Figure 1, alpha-diversity by Shannon’s diversity index based on mothur shows an overall decreasing trend as depth increases. Specifically, diversity is stable from depths of 0-100 m, decreases from 100-150 m, and stabilizes at 150-200 m. Figure 1 also shows an overall increasing trend as oxygen increases. Specifically, Shannon’s diversity index increases from 2.3-4.25 at oxygen concentrations of 0-40 \(\mu\)M and decreases from 4.25-3.9 at oxygen concentrations of 40-220 \(\mu\)M. It was also observed that Shannon’s diversity index is higher in oxic conditions (3.84 ± 0.45) than anoxic conditions (2.39 ± 0.07).

QIIME2
The trends of Shannon’s diversity index across depth and oxygen based on QIIME2 data are similar to those of mothur data. However, the QIIME2 pipeline produces higher Shannon’s diversity index values than mothur does. Shannon’s diversity index increases from 2.9-5.2 at oxygen concentrations of 0-40 \(\mu\)M and decreases from 5.2-5.1 at oxygen concentrations of 40-200 \(\mu\)M. Figure 2 shows that Shannon’s diversity index is higher in oxic conditions (4.80 ± 0.43) than anoxic conditions (3.15 ± 0.18).
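
For reference, Shannon's diversity index is commonly written as \(H' = -\sum p_i \ln p_i\), where \(p_i\) is the fractional abundance of the \(i^{th}\) taxon. The sketch below assumes the natural logarithm, which we believe is the default in vegan's `diversity` function. The counts are invented to show how evenness changes the index, and `shannon` is a hypothetical helper.

```r
# Shannon's diversity index, H' = -sum(p_i * ln(p_i)).
shannon <- function(counts) {
  # Drop zero counts so that log(0) never enters the sum.
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Invented counts: same richness (4 taxa), different evenness.
even   <- c(10, 10, 10, 10)
skewed <- c(37, 1, 1, 1)

shannon(even)    # equals log(4) for a perfectly even community
shannon(skewed)  # lower, because one taxon dominates
```

This illustrates why a lower index in anoxic samples can reflect fewer taxa, a more uneven community, or both.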

Alpha-diversity of mothur data
## `geom_smooth()` using method = 'loess'

Table 1. Average and standard deviation of alpha-diversity by oxic/anoxic with mothur data
Statistic            Oxic        Anoxic
Average              3.8401008   2.3884700
Standard deviation   0.4523233   0.0666717
Alpha-diversity of QIIME2 data
## `geom_smooth()` using method = 'loess'

Table 2. Average and standard deviation of alpha-diversity by oxic/anoxic with QIIME2 data
Statistic            Oxic        Anoxic
Average              4.7959464   3.1546427
Standard deviation   0.4296851   0.1784873

Taxa presence and abundance:
Mothur
The mothur pipeline detects 31 taxa at the phylum level (Figure 3). These taxa, however, differ in abundance by orders of magnitude. Among them, Proteobacteria is the most predominant phylum in all samples, with the highest average abundance (over 75). In contrast, the phylum Peregrinibacteria has an abundance no larger than 0.001 in the seven samples (Figure 3). Additionally, different taxa show distinct changes in abundance across depth. For instance, both Thaumarchaeota and Verrucomicrobia reach their maximum abundance at a depth of 100 m and gradually decrease in abundance until 200 m, while Latescibacteria and Fibrobacteres are almost undetectable in shallow water and their abundances increase dramatically at a depth of 200 m.

QIIME2
The QIIME2 pipeline detects 29 known taxa and unknown taxa at the phylum level (Figure 4). Proteobacteria is still the most abundant phylum across samples. Chlorobi and Candidate division OP3 have the smallest abundances, no larger than 0.004. The two pipelines can also give different abundance patterns for taxa that are shared by the mothur and QIIME2 data. Although the abundance of the phylum Actinobacteria changes across depth in the same way in both datasets, Chloroflexi abundance increases gradually with depth in the QIIME2 data, whereas in the mothur data it declines from 100 m to 135 m and then increases gradually until 200 m.

  2. Does your taxon of interest significantly differ in abundance with depth and/or oxygen concentration?

Mothur
The difference in cyanobacteria abundance across depth or oxygen was assessed with linear models using mothur processed data. The statistical results show that abundance of cyanobacteria is significantly different with depth (p = 0.01263) and oxygen (p = 0.00012). The linear models in Figure 5 indicate opposite trends across depth and oxygen: abundance decreases as depth increases, and increases as oxygen concentration increases.

QIIME2
For data processed by the QIIME2 pipeline, ANOVA tests indicate that cyanobacteria abundance in the seven samples differs significantly across depth (p = 0.014) and oxygen (p = 0.013). The linear models in Figure 6 reveal that cyanobacteria abundance decreases in deeper water and in environments with low oxygen.

## 
## Call:
## lm(formula = Abundance ~ Depth_m, data = .)
## 
## Residuals:
##       1       2       6       4       5       3       7 
##  46.352 -50.700  20.041 -16.609  -3.284 -39.933  44.132 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 173.5309    39.3958   4.405  0.00699 **
## Depth_m      -1.0883     0.2864  -3.800  0.01263 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.31 on 5 degrees of freedom
## Multiple R-squared:  0.7428, Adjusted R-squared:  0.6914 
## F-statistic: 14.44 on 1 and 5 DF,  p-value: 0.01263
## 
## Call:
## lm(formula = Abundance ~ O2_uM, data = .)
## 
## Residuals:
##       1       2       6       4       5       3       7 
##   6.768 -17.048  19.375  -4.217  12.375 -22.627   5.375 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.3745     7.4690   -0.72 0.504007    
## O2_uM         0.9582     0.0885   10.83 0.000117 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.87 on 5 degrees of freedom
## Multiple R-squared:  0.9591, Adjusted R-squared:  0.9509 
## F-statistic: 117.2 on 1 and 5 DF,  p-value: 0.0001167

## 
## Call:
## lm(formula = Abundance ~ Depth_m, data = .)
## 
## Residuals:
##        1        4        5        3        2        6        7 
##   63.270  160.404   72.980  -30.172 -221.273  -44.444   -0.766 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 687.7807   122.7658   5.602   0.0025 **
## Depth_m      -3.3051     0.8925  -3.703   0.0140 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131.8 on 5 degrees of freedom
## Multiple R-squared:  0.7328, Adjusted R-squared:  0.6794 
## F-statistic: 13.71 on 1 and 5 DF,  p-value: 0.01395
## 
## Call:
## lm(formula = Abundance ~ O2_uM, data = .)
## 
## Residuals:
##         1         4         5         3         2         6         7 
##    0.5171  190.2269  105.9213   18.5371 -121.0450  -61.0787 -133.0787 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 159.0787    57.3201   2.775   0.0391 *
## O2_uM         2.5772     0.6792   3.794   0.0127 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 129.5 on 5 degrees of freedom
## Multiple R-squared:  0.7422, Adjusted R-squared:  0.6907 
## F-statistic:  14.4 on 1 and 5 DF,  p-value: 0.0127
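
The summaries above can be cross-checked by hand. The following is a minimal Python sketch, not part of the original analysis, that recomputes the t value, F-statistic, and R-squared of a one-predictor model from the reported slope estimate and standard error. The function name `slope_summary` is hypothetical, and the identities used (F = t² and R² = F / (F + residual df)) hold only for a model with a single predictor and an intercept.

```python
def slope_summary(estimate, std_error, n_obs):
    """Return (t, F, R^2) for a linear model with one predictor and an intercept."""
    df_resid = n_obs - 2                 # two coefficients are estimated
    t_value = estimate / std_error       # t statistic of the slope
    f_value = t_value ** 2               # with one predictor, F equals t squared
    r_squared = f_value / (f_value + df_resid)
    return t_value, f_value, r_squared


# Depth_m slope from the first summary above: estimate -1.0883, SE 0.2864, 7 samples
t_value, f_value, r_squared = slope_summary(-1.0883, 0.2864, 7)
print(round(t_value, 2), round(f_value, 2), round(r_squared, 4))
```

The values agree, up to rounding of the printed inputs, with the t value (-3.800), F-statistic (14.44), and multiple R-squared (0.7428) reported for the first model.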

  1. Within your taxon, what is the richness (number of OTUs/ASVs)?

Mothur
Across all samples, there are 15 OTUs within the phylum Cyanobacteria. Table 3 presents the number of cyanobacterial OTUs in each sample. Most samples contain 3 to 5 cyanobacterial OTUs, except Saanich_120 and Saanich_200, which have only 1 and 0 respectively.

QIIME2
There are 51 ASVs within the phylum Cyanobacteria across all samples. The number of cyanobacterial ASVs in each sample is shown in Table 3. Saanich_010, Saanich_120, and Saanich_135 have relatively high ASV counts, equal to or over 15. However, it is important to note that there are no singletons in the ASV dataset. The function used to estimate richness, “estimate_richness” from the phyloseq library, is highly dependent on the number of singletons and warns of unreliable or wrong results when the data contain none.

Table showing richness of OTUs using mothur data and ASVs using QIIME2 data
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 15 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 22 sample variables ]
## tax_table()   Taxonomy Table:    [ 15 taxa by 7 taxonomic ranks ]
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 51 taxa and 7 samples ]
## sample_data() Sample Data:       [ 7 samples by 22 sample variables ]
## tax_table()   Taxonomy Table:    [ 51 taxa by 7 taxonomic ranks ]
## Warning in estimate_richness(., measures = c("Observed")): The data you have provided does not have
## any singletons. This is highly suspicious. Results of richness
## estimates (for example) are probably unreliable, or wrong, if you have already
## trimmed low-abundance taxa from the data.
## 
## We recommended that you find the un-trimmed data and retry.
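Table 3. OTUs/ASVs across depth

| Sample | Depth_m | OTU | ASV |
|---|---|---|---|
| Saanich_010 | 10 | 5 | 17 |
| Saanich_100 | 100 | 4 | 8 |
| Saanich_120 | 120 | 1 | 15 |
| Saanich_135 | 135 | 4 | 17 |
| Saanich_150 | 150 | 3 | 11 |
| Saanich_165 | 165 | 3 | 5 |
| Saanich_200 | 200 | 0 | 2 |
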
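
The "Observed" richness reported in Table 3 is a count of taxa with a non-zero abundance in a sample. As a rough illustration, the Python sketch below computes that count, along with the number of singletons, for one sample. The function names and the count vector are hypothetical and are not taken from the Saanich data or from phyloseq.

```python
def observed_richness(counts):
    """Number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def singleton_count(counts):
    """Number of taxa observed exactly once in one sample."""
    return sum(1 for c in counts if c == 1)


# HYPOTHETICAL per-taxon counts for a single sample (illustration only)
sample_counts = [120, 0, 3, 1, 0, 45, 1]
print(observed_richness(sample_counts))  # taxa present in the sample
print(singleton_count(sample_counts))    # taxa seen exactly once
```

Observed richness as defined here does not use the singleton count, whereas estimators such as Chao1 do, which is one reason a dataset without singletons triggers a warning.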
  1. Do the abundances of OTUs/ASVs within your taxon of interest change significantly with depth and/or oxygen concentration?

Using linear models for statistical testing, and after correcting the p-values for multiple comparisons, the abundances of OTUs 0189, 0658, 1104, 3852, and 4312 from the mothur pipeline within the phylum Cyanobacteria changed significantly with both depth and oxygen. Interestingly, no ASV from the QIIME2 pipeline changed significantly in abundance across depth after p-value correction. However, 17 ASVs changed significantly in abundance with oxygen concentration.
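
The report does not name the correction method. As a sketch under that caveat, the Python code below implements the Benjamini-Hochberg adjustment, which is an assumption chosen because it reproduces the Adjusted_P column of Table 4 when 15 tests (one per cyanobacterial OTU) are assumed. The p-values of the ten non-significant OTUs are not reported, so they are filled with the placeholder value 1.0; the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Raw depth p-values of the five OTUs listed in Table 4
table4_p = [0.0155403, 0.0155891, 0.0163987, 0.0163987, 0.0163987]
# ASSUMPTION: the remaining ten p-values are unknown; 1.0 is a placeholder
all_p = table4_p + [1.0] * 10
adjusted = bh_adjust(all_p)
print([round(p, 4) for p in adjusted[:5]])
```

Under these assumptions, all five OTUs receive the same adjusted value of about 0.0492, matching Table 4 to within rounding.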

Mothur
Table 4. Correlation data of Cyanobacteria OTUs with significant differences across depth using mothur data

| OTU | Estimate | Std. Error | t value | P_value | Adjusted_P |
|---|---|---|---|---|---|
| Otu0189 | -0.9610475 | 0.2669442 | -3.600181 | 0.0155403 | 0.0491962 |
| Otu0658 | -0.0685106 | 0.0190455 | -3.597207 | 0.0155891 | 0.0491962 |
| Otu1104 | -0.0583306 | 0.0164341 | -3.549356 | 0.0163987 | 0.0491962 |
| Otu3852 | -0.0159083 | 0.0044820 | -3.549356 | 0.0163987 | 0.0491962 |
| Otu4312 | -0.0053028 | 0.0014940 | -3.549356 | 0.0163987 | 0.0491962 |

Table 5. Correlation data of Cyanobacteria OTUs with significant differences across oxygen using mothur data

| OTU | Estimate | Std. Error | t value | P_value | Adjusted_P |
|---|---|---|---|---|---|
| Otu0189 | 0.8586534 | 0.0788686 | 10.88714 | 0.0001136 | 0.0004035 |
| Otu0658 | 0.0611354 | 0.0058159 | 10.51177 | 0.0001345 | 0.0004035 |
| Otu1104 | 0.0522766 | 0.0049096 | 10.64789 | 0.0001264 | 0.0004035 |
| Otu3852 | 0.0142572 | 0.0013390 | 10.64789 | 0.0001264 | 0.0004035 |
| Otu4312 | 0.0047524 | 0.0004463 | 10.64789 | 0.0001264 | 0.0004035 |

QIIME2
## [1] "None of ASV has significantly different abundance acrossing depth with QIIME2 data"

Table 6. Correlation data of Cyanobacteria ASVs with significant differences across oxygen using QIIME2 data

| ASV | Estimate | Std. Error | t value | P_value | Adjusted_P |
|---|---|---|---|---|---|
| Asv12 | 0.0855435 | 0.0080338 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv144 | 0.1568297 | 0.0147287 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv294 | 0.4104610 | 0.0429693 | 9.552421 | 0.0002128 | 0.0006784 |
| Asv404 | 0.1948490 | 0.0182993 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv663 | 0.9749372 | 0.0956990 | 10.187539 | 0.0001564 | 0.0005316 |
| Asv790 | 0.0095048 | 0.0008926 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv945 | 0.0950483 | 0.0089265 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1055 | 0.0380193 | 0.0035706 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1085 | 0.1710870 | 0.0160677 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1141 | 0.0665338 | 0.0062485 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1209 | 0.0285145 | 0.0026779 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1454 | 0.1283152 | 0.0120508 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1578 | 0.0285145 | 0.0026779 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1728 | 0.2281160 | 0.0214236 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv1817 | 0.2946498 | 0.0276721 | 10.647891 | 0.0001264 | 0.0004605 |
| Asv2018 | 0.3390505 | 0.0720444 | 4.706133 | 0.0053079 | 0.0159238 |
| Asv2336 | 0.1045531 | 0.0098191 | 10.647891 | 0.0001264 | 0.0004605 |

  1. Are the answers to the above the same using mothur and QIIME2 processed data?

The overall trends in the two datasets were similar, but the statistical interpretations varied between pipelines. For starters, the Shannon diversity of the whole microbial community estimated from mothur-processed data was generally smaller than that estimated from QIIME2-processed data. Additionally, ANOVA tests indicated that Shannon diversity did not change significantly across depth in the mothur data (p = 0.054), whereas it did change significantly across depth in the QIIME2 data (p = 0.022). Within the phylum Cyanobacteria, the QIIME2 pipeline produced 51 ASVs compared to 15 OTUs from the mothur pipeline. Beyond this difference in richness, the taxon-level tests also diverged: 5 OTUs differed significantly in abundance with respect to both depth and oxygen, while no ASV changed significantly in abundance across depth and 17 ASVs differed significantly across oxygen concentrations. Despite these variations, the abundance of Cyanobacteria as a whole differed significantly with respect to depth and oxygen along the water column in both pipelines.

ANOVA on alpha-diversity of mothur data across depth
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Depth_m      1  2.355  2.3554   6.265 0.0543 .
## Residuals    5  1.880  0.3759                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ANOVA on alpha-diversity of QIIME2 data across depth
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Depth_m      1  3.563   3.563   10.65 0.0224 *
## Residuals    5  1.673   0.335                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
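
For reference, the quantities behind these two tables can be sketched in a few lines of Python. The code below is an illustrative assumption rather than the code used in the report: `shannon_index` computes the Shannon diversity H = -Σ p_i ln(p_i) from a vector of counts, and `f_ratio` recomputes an ANOVA F value from the mean squares printed above. Both function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def f_ratio(mean_sq_effect, mean_sq_residual):
    """ANOVA F value: effect mean square divided by residual mean square."""
    return mean_sq_effect / mean_sq_residual


# Two equally abundant taxa give H = ln(2)
print(round(shannon_index([10, 10]), 4))
# Mean squares from the mothur ANOVA table above
print(round(f_ratio(2.3554, 0.3759), 2))
```

The recomputed F value is close to the 6.265 reported for the mothur data; small differences come from rounding in the printed mean squares.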

Discussion

The significant differences in Cyanobacteria abundance across depth and oxygen may be explained by photosynthesis and water temperature. Bacteria of the phylum Cyanobacteria obtain their energy through photosynthesis. These phototrophs are characterized by phycocyanin, a bluish pigment that functions as a light-harvesting protein complex accessory to chlorophyll. Phycocyanin absorbs orange and red light at approximately 620 nm and fluoresces at about 650 nm, depending on the species (10). This is particularly relevant because orange and red light are typically absorbed within the first 50 m of a water column (11). Our data (Figures 5 and 6) show significant differences in the abundance of Cyanobacteria across the Saanich Inlet depth profile in both the mothur and QIIME2 pipelines. This pattern likely arises because Cyanobacteria thrive in the upper part of the water column, where they can still absorb the red and orange light that is essential for cyanobacterial photosynthesis. These findings are further supported by the oxygen and fluorescence profiles across the water column (Figure 11). Oxygen and chlorophyll a concentrations differ significantly across depth, and the higher concentrations of both within the top 50 m indicate increased photosynthetic activity. Moreover, the abundance of Cyanobacteria differed significantly across oxygen and chlorophyll a concentrations in both pipelines; high oxygen and chlorophyll a concentrations were associated with a larger abundance. Because cyanobacterial photosynthesis contributes a substantial proportion of the oxygen in Earth’s atmosphere, it is not surprising that a high abundance of Cyanobacteria is associated with high concentrations of oxygen and chlorophyll a and with shallow depth.
Moreover, it has been reported that the growth rate of cyanobacteria is significantly influenced by temperature, with lower growth rates at colder temperatures. For marine cyanobacteria, the optimal growth temperature range is 20 to 27.5 °C; at these temperatures, cyanobacteria grow at a rate of 0.8 d⁻¹ (12). When the temperature dropped to approximately 15 °C, the growth rate slowed to 0.22 d⁻¹. Interestingly, when the temperature dropped below 10 °C, cyanobacterial growth stopped completely (13). Therefore, in our study, the decline in cyanobacteria abundance with depth may be at least partly attributed to decreasing temperature. According to the temperature data for each sample, the temperature is close to 13 °C at 10 m and decreases to about 9 °C at depths below 100 m. Hence, cyanobacteria grow substantially more slowly below 100 m than near the water’s surface, which leads to a lower abundance at greater depths.
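
To put the two reported growth rates in perspective, they can be converted to doubling times. The Python sketch below assumes that the rates are specific growth rates under exponential growth, so that the doubling time is ln(2) divided by the rate; this assumption and the function name `doubling_time_days` are not from the cited studies.

```python
import math


def doubling_time_days(growth_rate_per_day):
    """Doubling time in days, assuming exponential growth at the given rate."""
    return math.log(2) / growth_rate_per_day


# Growth rates quoted above for optimal temperatures and for about 15 °C
for rate in (0.8, 0.22):
    print(rate, round(doubling_time_days(rate), 2))
```

Under this assumption, a population doubles in under a day at 0.8 d⁻¹ but needs a little over three days at 0.22 d⁻¹.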

Differences between pipelines make it difficult to draw conclusive statements in microbial ecology research, since we cannot separate actual biological differences from differences introduced by the particular pipeline being used. Such differences could also suggest that one pipeline is better suited to the dataset. They could even be exploited, with a pipeline selected for the results it gives rather than for its appropriateness to the given dataset.

In the context of this project, the main difference between the two pipelines is whether the pipeline produces OTUs (mothur) or ASVs (QIIME2). The two use different algorithms to determine “true” sequences, and as a result, the QIIME2 pipeline usually produces far more ASVs than the mothur pipeline produces OTUs. In downstream analysis, this may be one reason for the large disparity in the number and quality of sequences produced, even when starting from the same initial data.
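
The contrast between threshold-based OTUs and exact sequence variants can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the reads are hypothetical, identity is a simple position-by-position comparison of equal-length sequences, the greedy clustering is not the algorithm mothur uses, and unique sequences stand in for ASVs without any of the denoising that a real ASV workflow performs.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def greedy_otus(sequences, threshold=0.97):
    """Greedy clustering: a read starts a new OTU if it matches no centroid."""
    centroids = []
    for seq in sequences:
        if not any(identity(seq, c) >= threshold for c in centroids):
            centroids.append(seq)
    return centroids


# HYPOTHETICAL 100-base reads
reference = "ACGT" * 25
variant_close = "T" + reference[1:]       # 1 mismatch, 99% identical
variant_far = "T" * 10 + reference[10:]   # 8 mismatches, 92% identical
reads = [reference, variant_close, variant_far, reference]

print(len(greedy_otus(reads)))  # OTUs at a 97% identity threshold
print(len(set(reads)))          # exact unique sequences
```

In this toy case the 99% variant is absorbed into the reference OTU but counts as its own exact sequence, which is one way more ASVs than OTUs can arise from the same reads.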

However, when counting abundances within our taxon, Cyanobacteria, we saw a far lower abundance in the QIIME2 results than in the mothur results, which cannot be explained by ASVs being more numerous than OTUs. Interestingly, running the estimate_richness function on our QIIME2 data produced a warning that the data provided did not contain any singletons (presumably in the pipeline output) and that the results are probably unreliable or wrong. This warning did not occur when running the function on the mothur data. The reason for this warning should be investigated before fully trusting the results of this function with QIIME2 data.

In subsequent analyses with these two pipelines, each function available in the phyloseq package should perhaps be examined in more depth. A dataset from a well-studied, known community could be used so that the output of each function can be compared between pipelines, and the results from mothur and QIIME2 can be assessed against the expected results. Use cases should also be considered, and standard use cases for each pipeline should be stated so that pipelines are not exploited for favourable results. Mothur may be better suited to one dataset, while QIIME2 could be more appropriate for another dataset that calls for a “denoising” algorithm.

Future directions for this project could involve analyzing water samples from other OMZs at various depths and examining the Cyanobacteria data in those samples to see whether the relationships between oxygen, nitrogen, and the abundance of the phylum match those observed in this study. These samples could again be analyzed with both mothur and QIIME2 for further comparison between the pipelines.

References

  1. Walsh DA, Zaikova E, Howes CG, Song YC, Wright JJ, Tringe SG, Hallam SJ. 2009. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326(5952): 578-582.
  2. Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Scientific Data 4(170159).
  3. Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Tringe S, Hallam SJ. 2017. Data Descriptor: A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data 4(170160).
  4. Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Comment: Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data 4(170158).
  5. National Research Council. 2007. The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press, Washington, DC.
  6. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Tringe SG. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35: 725-731.
  7. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879): 1034-1039.
  8. Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS computational biology 6(2), e1000667.
  9. Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME journal 11(12): 2639-2643.
  10. Simis SG, Huot Y, Babin M, Seppala J, Metsamaa L. 2012. Optimization of variable fluorescence measurements of phytoplankton communities with cyanobacteria. Photosynthesis research 112(1): 13-30.
  11. Light in the ocean (n.d.) [Online]. Link: https://manoa.hawaii.edu/exploringourfluidearth/physical/ocean-depths/light-ocean.
  12. Boyd PW, Rynearson TA, Armstrong EA, Fu F, Hayashi K, Hu Z, Hutchins DA, Kudela RM, Litchman E, Mulholland MR, Passow U, Strzepek RF, Whittaker KA, Yu E, Thomas MK. 2013. Marine phytoplankton temperature versus growth responses from polar to tropical waters - outcome of a scientific community-wide study. PLoS ONE 8(5): e63091.
  13. Berg M, Sutula M. 2015. Factors affecting the growth of cyanobacteria with special emphasis on the Sacramento-San Joaquin Delta. Southern California Coastal Water Research Project Technical Report 869.

Evidence worksheet_05 “Extensive mosaic structure”

Part 1: Learning objectives:

Evaluate the concept of microbial species based on environmental surveys and cultivation studies.

Explain the relationship between microdiversity, genomic diversity and metabolic potential

Comment on the forces mediating divergence and cohesion in natural microbial communities

General Questions:

  • What were the main questions being asked?

    • Understanding the pathogenicity and evolutionary diversity of E. coli
    • How do different E. coli strains differ at genomic scale and at physiological scale?
    • Is there a physiological difference or just a pathological difference?
  • What were the primary methodological approaches used?
    • isolated CFT073
    • PCR and primer walking experiment
    • obtain whole-genome libraries (shotgun and assemble DNA)
    • sequenced random clones
    • Sanger sequencing
    • Annotated genome sequence
    • defined ORFs and Blast predicted proteins
  • Summarize the main results or findings.
    • 12 distinct types of fimbriae gene in genome of CFT073
    • only 39.2% of the strains' combined (nonredundant) set of proteins is common to all of them; thus, only 39.2% of the genome defines them as E. coli; only 3% divergence at the 16S level
    • codon variation; GC content; proximity to tRNA (relevant to gene transfer)
    • the difference in disease potential between O157:H7 and CFT073 is reflected in the absence from CFT073 of genes for a type III secretion system and for phage- and plasmid-encoded toxins
    • CFT073 genome rich in genes that encode fimbrial adhesins, autotransporters, iron-sequestration systems and phase-switch recombinases
    • many islands acquired by different horizontal transfer events in each strain
    • 2/3 of island genes shared by EDL933 and CFT073 have unknown functions or are associated with phage or insertion sequence elements
  • Do new questions arise from the results?
    • Why is there limited intragenomic rearrangement in vertical evolution?
    • What kind of body signaling molecules/features allow pathogenic bacteria to colonize at the specific location?
    • Cannot use 16S to fingerprint; why are we sequencing E. coli based on 16S?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • Fig.1 is quite difficult to interpret due to the vast amount of information provided and requires quite a bit of time to analyze
    • Fig.3 is a bit confusing to understand due to the small markers and black/white scheme

Module_3 essay

Assessing environmental diversity is a complex process. Not all organisms grow under the same conditions, so many microbial species are “unculturable” in the laboratory and are difficult to detect without sequence analysis. Metagenomic analysis is a method used to define microbial species based on sequence or functional similarity. An environmental sample may be analyzed using whole genomes or conserved genetic regions such as the 16S ribosomal RNA (rRNA) sequence. However, many complications are involved in this process, particularly due to horizontal gene transfer (HGT) between species. In this essay, we will discuss the challenges that HGT creates for defining species, how HGT maintains global biogeochemical cycles, and why it is important to have a clear definition of microbial species.

Defining microbial species is challenging because of microbial evolution, particularly as a result of HGT. In metagenomics, mixed reads from different species in an environment are binned into operational taxonomic units (OTUs) based on DNA sequence similarity at a set threshold (1). OTUs correspond to microbial species, which allows researchers to determine species abundance in an environment. Specific genes can also be binned by their functional characteristics, based on functional gene anchors, in order to interpret sets of genes (1). However, it is then unclear which species a binned gene originated from, which makes it difficult to trace the origin of a particular gene or metabolic pathway. Distinguishing species becomes more complicated when genes are transferred between microbial species by HGT and recombination. Currently, species are broadly defined, in that strains differ with regard to physiology, genome content and ecology (2). Periodic selection can drive an ecotype to extinction while allowing another ecotype that has obtained a favourable gene by HGT to replace the original strain or species (2). Because HGT events are frequent, species may switch from one niche to another and acquire metabolic genes required for survival. With this ability to acquire and donate genes, it is challenging to define a microbial species because its characteristics, such as ecology and expressed genes, can change frequently.

HGT influences the maintenance of global biogeochemical cycles through time by allowing essential genes required for survival to be dispersed to other organisms in the environment. Through HGT, organisms are able to transfer genes to other related species (3). For example, in the nitrogen (N) cycle, the gene for nitrogenase has been transferred multiple times across taxa throughout history (3, 4). Specifically, the nif gene involved in nitrogen fixation has been suggested to have been acquired through HGT in the cyanobacterium Microcoleus chthonoplastes (5). Another example is the sulfur oxygenase reductase (SOR) gene, used in the sulfur cycle and found in the Sulfolobales taxa (6). Homologs of the gene were found in Acidianus tengchongensis and other microbes, suggesting that the SOR gene was acquired via HGT (6). Although HGT is common only among closely related species and for simple pathways requiring few enzymes, spreading these important genes across taxa can protect them from extinction. As previously mentioned, HGT can cause niche differentiation (2). This may help maintain the global biogeochemical cycles by keeping species that perform an important metabolic reaction of a cycle from being outcompeted. Although HGT complicates the process of defining a species, it assists in the maintenance of biogeochemical cycles.

It is necessary to have a clear definition of species because it provides researchers and scientists with an agreed-upon species name for important purposes such as scientific analysis or clinical diagnosis. With a defined name for a species, important characteristics and information about the species would be universal for researchers and scientists around the world (7). This would allow clinical microbiologists and physicians to integrate findings from research on microbial disease to make consistent clinical diagnoses (7). For example, the species name Salmonella typhi gives a microbiologist immediate knowledge of the disease the organism causes, the shape of the cells, and its natural environment. In addition, scientists would be able to make meaningful comparisons of the microbial diversity of different samples, and research groups analyzing environmental samples would have less difficulty collaborating thanks to consistent species naming. For example, in the “Candy” in-class activity in MICB 425, each group named their candy sample differently. The inconsistent naming among multiple independent groups caused confusion and made it difficult to compile data for comparing the candy samples. A standard definition of species would resolve the issue and would also allow for a more organized microbial database. Thus, having a clear definition of a microbial species is necessary to avoid confusion and make research more efficient.

Microbial evolution due to HGT makes analyzing environmental microbial diversity and defining microbial species a complex task for microbiologists. HGT allows the transfer of genes or operons to related species. This complicates the process of defining a microbial species because the characteristics of a species change with gene acquisition and the selective environment. However, HGT can help maintain global biogeochemical cycles by propagating important genes of the cycles to other species and by driving niche differentiation, which reduces competition between species. Although HGT renders defining species challenging, a clear definition of species remains essential for communication between and within the scientific and medical communities.

Module 03 references

Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.

  • Torres-Beltrán, M., Hawley, A. K., Capelle, D., Zaikova, E., Walsh, D. A., Mueller, A., … & Finke, J. (2017). A compendium of geochemical information from the Saanich Inlet water column. Scientific data, 4, 170159. link
  • Sogin, M. L., Morrison, H. G., Huber, J. A., Welch, D. M., Huse, S. M., Neal, P. R., … & Herndl, G. J. (2006). Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences, 103(32), 12115-12120. link
  • Welch, R. A., Burland, V., Plunkett, G. I. I. I., Redford, P., Roesch, P., Rasko, D., … & Stroud, D. (2002). Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences, 99(26), 17020-17024. link
  • Hawley, A. K., Torres-Beltrán, M., Zaikova, E., Walsh, D. A., Mueller, A., Scofield, M., … & Shevchuk, O. (2017). A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific data, 4, 170160. link
  • Kunin, V., Engelbrektson, A., Ochman, H., & Hugenholtz, P. (2010). Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental microbiology, 12(1), 118-123. link
  • Hallam, S. J., Torres-Beltrán, M., & Hawley, A. K. (2017). Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific data, 4. link
  • Callahan, B. J., McMurdie, P. J., & Holmes, S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME journal, 11(12), 2639. link
  • Thompson, J. R., Pacocha, S., Pharino, C., Klepac-Ceraj, V., Hunt, D. E., Benoit, J., … & Polz, M. F. (2005). Genotypic diversity within a natural coastal bacterioplankton population. Science, 307(5713), 1311-1313. link

References for essay 3:
1. Eisen, J. 2007. Environmental shotgun sequencing: Its potential and challenges for studying the hidden world of microbes. Plos Biology. 5:384-388. doi: 10.1371/journal.pbio.0050082.
2. Wiedenbeck, J, Cohan, FM. 2011. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiol. Rev. 35:957-976. doi: 10.1111/j.1574-6976.2011.00292.x.
3. Schimel, J, Schaeffer, S. 2012. Microbial control over carbon cycling in soil. Frontiers in Microbiology. 3:348. doi: 10.3389/fmicb.2012.00348.
4. Falkowski, PG, Fenchel, T, Delong, EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320:1034-1039. doi: 10.1126/science.1153213.
5. Bolhuis, H, Severin, I, Confurius-Guns, V, Wollenzien, UIA, Stal, LJ. 2010. Horizontal transfer of the nitrogen fixation gene cluster in the cyanobacterium Microcoleus chthonoplastes. ISME Journal. 4:121-130. doi: 10.1038/ismej.2009.99.
6. Blank, CE. 2012. Low rates of lateral gene transfer among metabolic genes define the evolving biogeochemical niches of Archaea through deep time. Archaea. 2012:23. doi: 10.1155/2012/843539.
7. Baron, EJ. Classification. In: Baron S, editor. Medical Microbiology. 4th edition. Galveston (TX): University of Texas Medical Branch at Galveston; 1996. Chapter 3. Available from: https://www.ncbi.nlm.nih.gov/books/NBK8406/